Analytics: Constructing accurate benchmarks (Part II)

Ronald Surz

3 October 2006

Common practice is not always best practice; we can and must do better. Ronald Surz is president of PPCA, a San Clemente, Calif.-based software firm that provides advanced performance-evaluation and attribution analytics, and a principal of RCG Capital Partners, a Denver, Colo.-based fund-of-hedge-funds manager. He is a prolific and widely published author.

Continued from Viewpoint: Constructing accurate benchmarks.

Peer groups

Peer groups place a portfolio's performance into perspective by ranking it against the performance of similar portfolios. Accordingly, performance for even a short period of time can be adjudged significant if it ranks near the top of the distribution. When traditional peer groups are used, the "Performance is good" hypothesis is tested by comparing performance with that of a group of portfolios that is presumably managed in a manner similar to the portfolio that is being evaluated, so the hypothesis is tested relative to the stock picks of similar professionals.

This makes sense -- provided someone defines "similar" and then collects data on the funds that fit this particular definition of similar. Each peer group provider has its own definitions and its own collection of funds, so each provider has a different sample for the same investment mandate. "Large-cap growth" is one set of funds in one provider's peer group, and another set of funds in the next provider's peer group. These sampling idiosyncrasies are the source of the following well-documented peer group biases.

Classification bias results from the practice of forcing every manager into a pre-specified pigeonhole, such as growth or value. It is now commonly understood that most managers employ a blend of styles, so that pigeonhole classifications misrepresent the manager's actual style as well as those employed by peers. Classification bias is the reason that a style index ranks well, outperforming the majority of managers in an associated style peer group, when that style is in favor. Conversely, the majority of managers in an out-of-favor style tend to outperform an associated index. Until recently it was believed that skillful managers excelled when their style was out of favor. However, research has shown that this phenomenon is a direct result of the fact that many managers in a given style peer group are not "style pure," and it is this impurity, or classification bias, that leads to success or failure versus the index.

The illustration below shows the effect of classification bias. The scatter chart uses returns-based style analysis (RBSA) to locate the members of the Morningstar peer group in style space. As you can see, the funds tend to be somewhat similar, but significant compromises have been made.

|image1|
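The article doesn't spell out how RBSA locates funds in style space, but the standard approach is a constrained regression of fund returns on style-index returns. The sketch below is only an illustration of that idea, with made-up return data and hypothetical style indexes: find the long-only style weights, summing to one, that best track the fund.

```python
# A minimal sketch of returns-based style analysis (RBSA): find non-negative
# style weights summing to 1 that best explain a fund's returns with a blend
# of style-index returns. All data below are made up for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_months = 36

# Hypothetical monthly returns for four style indexes (large/small x growth/value)
style_returns = rng.normal(0.008, 0.04, size=(n_months, 4))

# Hypothetical fund: mostly large growth with a value tilt, plus noise
true_weights = np.array([0.6, 0.1, 0.0, 0.3])
fund_returns = style_returns @ true_weights + rng.normal(0, 0.005, n_months)

def tracking_variance(w):
    """Variance of the fund's return net of the style blend."""
    return np.var(fund_returns - style_returns @ w)

result = minimize(
    tracking_variance,
    x0=np.full(4, 0.25),                        # start from equal weights
    bounds=[(0.0, 1.0)] * 4,                    # long-only style weights
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    method="SLSQP",
)
print("Estimated style weights:", np.round(result.x, 2))
```

A fund pigeonholed as pure large growth whose best-fit weights place meaningful weight elsewhere is exactly the kind of impurity that produces the classification bias described above.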

Classification bias is a boon to client-relations personnel because there is always an easy target to beat. When your style is out of favor, you beat the index; when it's in favor, you beat the median.

Composition bias occurs because each peer group provider has its own set of fund data. This bias is particularly pronounced when a provider's database contains concentrations of certain fund types, such as bank commingled funds, and when it contains an insufficient number of funds. For example, international managers and socially responsible managers cannot be properly evaluated using peer groups because there are no databases of adequate size. Composition bias is the reason that managers frequently rank well in one peer group, but simultaneously rank poorly against a similar group of another provider, as Randall Eley, president and CIO of Springfield, Va.-based Edgar Lomax, shows in a 2004 article in Pensions & Investments.

|image2|

Don't like your ranking? Pick another peer group provider. It is frequently the case that a manager's performance result is judged to be both a success and a failure because the performance ranks differently in different peer groups for the same mandate, such as large cap value.

Survivorship bias is the best understood and most documented problem with peer groups. It causes performance results to be overstated because defunct accounts, some of which may have underperformed, are no longer in the database. For example, an unsuccessful management product that was terminated in the past is excluded from current peer groups, and this removal of losers overstates past performance. A related bias, "backfill bias," arises when managers withhold the performance data for new funds from peer group databases until an incubation period produces good results. Both survivorship and backfill biases raise the bar. A simple illustration of the way survivorship bias skews results is the "marathon analogy": only 100 runners in a 1,000-contestant marathon actually finish. Is the 100th runner dead last or in the top 10%?
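As a quick numerical illustration (not from the article), the toy simulation below drops the bottom 20% of a hypothetical 1,000-fund universe and shows how the survivors' median rises, so the same manager return earns a lower percentile rank against survivors than against the full universe.

```python
# Toy illustration of survivorship bias: dropping the worst-performing funds
# from a peer group raises its median, so a manager's ranking against the
# surviving funds understates relative performance.
import numpy as np

rng = np.random.default_rng(1)
all_fund_returns = rng.normal(0.07, 0.05, size=1000)   # hypothetical annual returns

# Assume the bottom 20% of funds close and vanish from the database
survivors = np.sort(all_fund_returns)[200:]

print(f"Median, full universe : {np.median(all_fund_returns):.2%}")
print(f"Median, survivors only: {np.median(survivors):.2%}")

manager_return = 0.07
print(f"Manager percentile vs full universe: {(all_fund_returns < manager_return).mean():.0%}")
print(f"Manager percentile vs survivors    : {(survivors < manager_return).mean():.0%}")
```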

Peer group comparisons are more likely to mislead than to inform, and so they should be avoided. Given the wide use of peer group comparisons, we realize this position is an unpopular one. The fact is that sometimes common practice defies common sense. Try as we may, there is no way to make the biases described above go away. The most that can be done is to try to minimize the effects of these biases, which can best be accomplished with the approach described in the next section.

Unification

Let's summarize what we've covered so far. Custom blended indexes provide accurate benchmarks, but we have to wait decades to gain confidence in a manager's success at beating the benchmark. Peer groups don't have this "waiting problem," but are contaminated by myriad biases that render them useless. A solution to these problems is actually quite simple, at least in concept, but was only recently made practical when the requisite computing power became available. The solution uses custom benchmarks to create a peer group backdrop that does not have a waiting problem, that is, we know right away if a manager has significantly succeeded or failed.

As noted above, performance evaluation can be viewed as a hypothesis test that assesses the validity of the hypothesis "Performance is good." To accept or reject this hypothesis, we construct an approximation of all of the possible outcomes and determine where the actual performance result falls. This solution begins with identification of the best benchmark possible, like a custom index blend, and then expands this benchmark into a peer group by creating thousands of portfolios that could have been formed from stocks in the benchmark, following reasonable portfolio construction rules. This approach, illustrated in Exhibit 3, combines the better characteristics of both peer groups and indexes, while reducing the deficiencies of each.

|image3|
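The article doesn't specify the portfolio construction rules, so the following is only a minimal sketch of the idea: simulate thousands of portfolios drawn from the constituents of a hypothetical equal-weighted custom benchmark under simple rules (50 holdings, random weights), then rank the manager's return in that cross-section. All returns and parameters are illustrative.

```python
# A minimal sketch of expanding a custom benchmark into a Monte Carlo peer
# group: draw thousands of random portfolios from the benchmark's constituents
# under simple construction rules, then rank the manager's return against them.
import numpy as np

rng = np.random.default_rng(2)

n_stocks = 500
stock_returns = rng.normal(0.02, 0.10, n_stocks)        # one-quarter stock returns
benchmark_weights = np.full(n_stocks, 1.0 / n_stocks)    # equal-weight custom benchmark
benchmark_return = stock_returns @ benchmark_weights

def random_portfolio_return(n_holdings=50):
    """One simulated portfolio: pick n_holdings stocks and weight them randomly."""
    picks = rng.choice(n_stocks, size=n_holdings, replace=False)
    w = rng.random(n_holdings)
    w /= w.sum()
    return stock_returns[picks] @ w

simulated = np.array([random_portfolio_return() for _ in range(10_000)])

manager_return = benchmark_return - 0.03                 # manager lags the benchmark by 3%
percentile = (simulated < manager_return).mean()
print(f"Benchmark return  : {benchmark_return:.2%}")
print(f"Manager percentile: beats {percentile:.0%} of simulated portfolios")
```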

Statistical significance is determined much more quickly with this approach than with benchmarks because inferences are drawn in the cross-section rather than across time. In other words, the ranking of actual performance against all possible portfolios is a measure of statistical confidence.

Let's say the manager has underperformed the benchmark by 3%. Exhibit 4 shows that in a recent quarter this underperformance would have been significant if the S&P 500 were the benchmark, but not significant if the benchmark were the Russell 2000. We use 90% confidence as the breakpoint for declaring significance. Because they provide indications of significance very quickly, Monte Carlo simulations (MCSs) solve the waiting problem of benchmarks.

|image4|
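One simple way to read the 90% breakpoint: a ranking in the top or bottom 10% of the simulated universe is declared significant. The helper below is an assumption about how that rule might be coded, not the author's implementation; the example percentiles mirror the Exhibit 4 situation, where a 3% shortfall falls deep in the tail of a tightly dispersed S&P 500 universe but well inside the body of a more widely dispersed Russell 2000 universe.

```python
# Turning a cross-sectional ranking into a significance call, using the 90%
# confidence breakpoint mentioned above (top/bottom 10% of the simulated universe).
def significance_verdict(percentile, confidence=0.90):
    """Classify a percentile rank (fraction of simulated portfolios beaten)."""
    if percentile >= confidence:
        return "significantly good"
    if percentile <= 1.0 - confidence:
        return "significantly bad"
    return "not significant"

# Illustrative ranks: 4th percentile vs. a tight universe, 30th vs. a wide one
for rank in (0.04, 0.30):
    print(f"percentile {rank:.0%}: {significance_verdict(rank)}")
```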

There are two central questions in the due diligence process: "What does this manager do?" -- that is, how and what does he manage -- and "Does the manager do this well?" The first question addresses the form of the investment, and the second identifies the substance, or skill. In this context, the benchmark provides the answer to the first question. The ranking within the manager's customized opportunity set answers the second. Note that in a properly constructed MCS, the benchmark always ranks at the median. This allows an MCS ranking to be interpreted as the "statistical distance" of return away from the benchmark.

The MCS approach has been used to evaluate traditional investing for more than a decade. MCS has yet to be accepted as standard practice, but this doesn't make it faulty. It took 30 years for Modern Portfolio Theory to gain wide acceptance. Further improving its potential for acceptance, MCS technology has been extended to hedge funds, where recognition of the fact that peer groups don't work for performance evaluation has lowered inherent barriers to adoption.

Hedge funds

The first question of due diligence -- "What does this manager do?" -- can be hard to answer in the hedge fund world. Here though, the old tenet about not investing in things you don't understand comes in handy. The beta of a specific hedge fund can be replicated with a long-short blend of passive portfolios such as exchange-traded funds. We shouldn't pay for beta, but its identification sets the stage for the second question regarding substance.
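The article doesn't describe how the replicating long-short blend is found; a common approach, sketched below with hypothetical index returns, is an unconstrained regression of the fund's returns on passive index (e.g., ETF) returns, where negative coefficients represent short exposure and the intercept is the part of return not explained by beta.

```python
# A minimal sketch of estimating a hedge fund's beta as a long-short blend of
# passive indexes: regress fund returns on index returns without a long-only
# constraint, so negative weights represent short exposure.
import numpy as np

rng = np.random.default_rng(3)
n_months = 60

index_returns = rng.normal(0.006, 0.04, size=(n_months, 3))   # e.g., large-cap, small-cap, value ETFs
true_exposures = np.array([1.2, -0.5, 0.3])                    # long large-cap, short small-cap
alpha = 0.002                                                  # 20 bps/month of hypothetical skill
fund_returns = alpha + index_returns @ true_exposures + rng.normal(0, 0.01, n_months)

# Ordinary least squares with an intercept: intercept ~ alpha, slopes ~ long-short betas
X = np.column_stack([np.ones(n_months), index_returns])
coef, *_ = np.linalg.lstsq(X, fund_returns, rcond=None)
print("Estimated monthly alpha:", round(coef[0], 4))
print("Estimated exposures    :", np.round(coef[1:], 2))
```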

As with traditional long-only investing, MCSs provide the answer to the question of manager skill. In constructing a specific custom peer group, the Monte Carlo simulations follow the same rules the individual hedge fund manager follows in constructing portfolios: going both long and short, following custom benchmark specifications on both sides, and using leverage, subject to controls such as those shown in this illustration.

|image5|
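Extending the earlier Monte Carlo sketch to hedge funds, the code below simulates long-short portfolios under illustrative construction controls (a 130/30 structure, i.e., 1.6x gross exposure, with separate long- and short-side universes). The specific limits are assumptions standing in for whatever rules a given manager actually follows.

```python
# Extending the Monte Carlo idea to hedge funds: each simulated portfolio goes
# long and short within its own benchmark specifications and applies leverage.
import numpy as np

rng = np.random.default_rng(4)

long_universe = rng.normal(0.02, 0.10, 400)     # quarterly returns, long-side benchmark
short_universe = rng.normal(0.02, 0.12, 400)    # quarterly returns, short-side benchmark

def random_long_short_return(gross_long=1.3, gross_short=0.3, n_long=40, n_short=20):
    """One simulated long-short portfolio under fixed gross exposure limits."""
    longs = rng.choice(long_universe, n_long, replace=False)
    shorts = rng.choice(short_universe, n_short, replace=False)
    wl = rng.random(n_long)
    wl = gross_long * wl / wl.sum()
    ws = rng.random(n_short)
    ws = gross_short * ws / ws.sum()
    return longs @ wl - shorts @ ws

simulated = np.array([random_long_short_return() for _ in range(10_000)])
fund_return = 0.035
print(f"Fund percentile in long-short MCS universe: {(simulated < fund_return).mean():.0%}")
```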

An MCS approach addresses the unique challenge of evaluating hedge fund performance by randomly creating a broad representation of all of the possible portfolios that a manager could have conceivably held following his unique investment process, and so applies the scientific principles of modern statistics to the problem of performance evaluation. This solves the problem arising from the fact that members of hedge fund peer groups are uncorrelated with one another, which violates the central homogeneity principle of peer groups.

Some observers say it's good that the members of hedge fund peer groups are unlike one another, because this produces diversification benefits. While it may be good for portfolio construction, it's bad for performance evaluation. Comparing funds in hedge fund peer groups is like comparing apples and oranges. Hedge funds really do require not only custom MCS peer groups for accurate evaluation, but also custom benchmarks that show both the longs and shorts, thereby estimating the hedge fund's beta. A ranking in a hedge fund MCS universe renders both the alpha and its significance.

Attribution

Up to this point we have been discussing performance evaluation, which determines whether performance is good or bad. The next, and more crucial, question is "Why?" -- the role of performance attribution. Attribution is important because it is forward-looking, providing the investor with information for deciding if good performance is repeatable in the future. We want to know which sectors had good stock selection or favorable allocations and if the associated analysts are likely to continue providing these good results.

We also want to know what mistakes have been made and what is being done to avoid these mistakes in the future. These are important considerations that fortunately can be addressed with the same accurate, customized benchmark that we've described for use in performance evaluation.
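The article doesn't name a particular attribution method; the sketch below uses a standard Brinson-style sector decomposition into allocation and selection effects, measured against the same custom benchmark used for evaluation. Sector names, weights, and returns are made up for illustration.

```python
# A minimal sketch of sector-level attribution against the custom benchmark:
# split active return into an allocation effect (over/underweighting sectors)
# and a selection effect (stock picking within sectors).
import pandas as pd

data = pd.DataFrame({
    "sector":           ["Tech", "Financials", "Energy"],
    "portfolio_weight": [0.50,   0.30,         0.20],
    "benchmark_weight": [0.40,   0.40,         0.20],   # custom benchmark, not a popular index
    "portfolio_return": [0.06,   0.01,         0.03],
    "benchmark_return": [0.05,   0.02,         0.03],
}).set_index("sector")

total_benchmark_return = (data["benchmark_weight"] * data["benchmark_return"]).sum()

# Allocation: reward overweighting sectors that beat the overall benchmark
data["allocation"] = (data["portfolio_weight"] - data["benchmark_weight"]) * \
                     (data["benchmark_return"] - total_benchmark_return)
# Selection: reward picking stocks that beat the sector's benchmark return
data["selection"] = data["portfolio_weight"] * \
                    (data["portfolio_return"] - data["benchmark_return"])

print(data[["allocation", "selection"]].round(4))
print("Total active return explained:", round((data["allocation"] + data["selection"]).sum(), 4))
```

In this toy example the two effects sum to the portfolio's 0.5% active return, sector by sector, which is the kind of detail that indicates whether good performance came from allocation calls or from the analysts' stock picks.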

This practice enables us to steer clear of a problem associated with more common attribution systems: the frequent disconnect between the benchmark used for evaluation and the one used for attribution. This disconnect arises because most performance attribution systems are currently limited to popular indexes and cannot accommodate custom benchmarks. This unfortunate limitation creates the very "garbage-in, garbage-out" problem we set out to avoid. We should not throw away all of our hard work in constructing an accurate benchmark when it comes to the important step of attribution.

Put another way, we shouldn't bother with attribution analyses if we can't customize the benchmark. We'll just spend a lot of time and money to be misled and misinformed.

Getting back to basics is more than just a good thing to do. Getting the benchmark right is a fiduciary imperative, an obligation. Even if you don't agree with this article's recommended best practices, you can't deny the failure of common practices. Something has to change. Current common practices are not best practices; we can and must do better.

The components of investment return as we understand them today are summarized in the accompanying graphic entitled "The Complete Performance Picture." The new element in this picture, beyond Modern Portfolio Theory, is indicated by the box labeled "Style Effects." MPT, which relies exclusively on market-related effects, has not worked as predicted because of the powerful influences of investment style. It's easy to confuse style with skill, but difficult to make good decisions once this mistake has been made.

|image6|

Accurate benchmarks are customized to each individual manager's style and should be used for both performance evaluation and performance attribution. Monte Carlo simulations expand these custom benchmarks into accurate and fair universes, similar to peer groups but without the biases, and provide indications of significance very quickly. Both traditional and hedge fund managers are best reviewed with these techniques. -FWR